Goto

Collaborating Authors

 property testing



A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Neural Information Processing Systems

The computational complexity of a property tester is the number of adjacency list entries it reads, denoted its queries . Many works in graph property testing focus on testing plain graphs that contain only the pure combinatorial information.


Efficient Testing Implies Structured Symmetry

arXiv.org Artificial Intelligence

Given a small random sample of $n$-bit strings labeled by an unknown Boolean function, which properties of this function can be tested computationally efficiently? We show an equivalence between properties that are efficiently testable from few samples and properties with structured symmetry, which depend only on the function's average values on parts of a low-complexity partition of the domain. Without the efficiency constraint, a similar characterization in terms of unstructured symmetry was obtained by Blais and Yoshida (2019). Our main technical tool is supersimulation, which builds on methods from the algorithmic fairness literature to approximate arbitrarily complex functions by small-circuit simulators that fool significantly larger distinguishers. We extend the characterization along other axes as well. We show that allowing parts to overlap exponentially reduces their required number, broadening the scope of the construction from properties testable with $O(\log n)$ samples to properties testable with $O(n)$ samples. For larger sample sizes, we show that any efficient tester is essentially checking for indistinguishability from a bounded collection of small circuits, in the spirit of a characterization of testable graph properties. Finally, we show that our results for Boolean function testing generalize to high-entropy distribution testing on arbitrary domains.


Testing classical properties from quantum data

arXiv.org Artificial Intelligence

Many properties of Boolean functions can be tested far more efficiently than the function can be learned. However, this advantage often disappears when testers are limited to random samples--a natural setting for data science--rather than queries. In this work we investigate the quantum version of this scenario: quantum algorithms that test properties of a function $f$ solely from quantum data in the form of copies of the function state for $f$. For three well-established properties, we show that the speedup lost when restricting classical testers to samples can be recovered by testers that use quantum data. For monotonicity testing, we give a quantum algorithm that uses $\tilde{\mathcal{O}}(n^2)$ function state copies as compared to the $2^{\Omega(\sqrt{n})}$ samples required classically. We also present $\mathcal{O}(1)$-copy testers for symmetry and triangle-freeness, comparing favorably to classical lower bounds of $\Omega(n^{1/4})$ and $\Omega(n)$ samples respectively. These algorithms are time-efficient and necessarily include techniques beyond the Fourier sampling approaches applied to earlier testing problems. These results make the case for a general study of the advantages afforded by quantum data for testing. We contribute to this project by complementing our upper bounds with a lower bound of $\Omega(1/\varepsilon)$ for monotonicity testing from quantum data in the proximity regime $\varepsilon\leq\mathcal{O}(n^{-3/2})$. This implies a strict separation between testing monotonicity from quantum data and from quantum queries--where $\tilde{\mathcal{O}}(n)$ queries suffice when $\varepsilon=\Theta(n^{-3/2})$. We also exhibit a testing problem that can be solved from $\mathcal{O}(1)$ classical queries but requires $\Omega(2^{n/2})$ function state copies, complementing a separation of the same magnitude in the opposite direction derived from the Forrelation problem.


Towards the methodology for solving the minimum enclosing ball and related problems

arXiv.org Artificial Intelligence

Methodology is provided towards the solution of the minimum enclosing ball problem. This problem concerns the determination of the unique spherical surface of smallest radius enclosing a given bounded set in the d-dimensional Euclidean space. Mathematical formulation and typical methods for solving this problem are presented. Also, the paper is focused on areas that are related to this problem, namely: (a) promise problems and property testing, (b) theorems for partitioning and enclosing (covering) a set, and (c) computation of the diameter of a set.


Hamiltonian Property Testing

arXiv.org Artificial Intelligence

Locality is a fundamental feature of many physical time evolutions. Assumptions on locality and related structural properties also underlie recently proposed procedures for learning an unknown Hamiltonian from access to the induced time evolution. However, no protocols to rigorously test whether an unknown Hamiltonian is local were known. We investigate Hamiltonian locality testing as a property testing problem, where the task is to determine whether an unknown $n$-qubit Hamiltonian $H$ is $k$-local or $\varepsilon$-far from all $k$-local Hamiltonians, given access to the time evolution along $H$. First, we emphasize the importance of the chosen distance measure: With respect to the operator norm, a worst-case distance measure, incoherent quantum locality testers require $\tilde{\Omega}(2^n)$ many time evolution queries and an expected total evolution time of $\tilde{\Omega}(2^n / \varepsilon)$, and even coherent testers need $\Omega(2^{n/2})$ many queries and $\Omega(2^{n/2}/\varepsilon)$ total evolution time. In contrast, when distances are measured according to the normalized Frobenius norm, corresponding to an average-case distance, we give a sample-, time-, and computationally efficient incoherent Hamiltonian locality testing algorithm based on randomized measurements. In fact, our procedure can be used to simultaneously test a wide class of Hamiltonian properties beyond locality. Finally, we prove that learning a general Hamiltonian remains exponentially hard with this average-case distance, thereby establishing an exponential separation between Hamiltonian testing and learning. Our work initiates the study of property testing for quantum Hamiltonians, demonstrating that a broad class of Hamiltonian properties is efficiently testable even with limited quantum capabilities, and positioning Hamiltonian testing as an independent area of research alongside Hamiltonian learning.


A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Neural Information Processing Systems

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.


Active Tolerant Testing

arXiv.org Machine Learning

In this work, we give the first algorithms for tolerant testing of nontrivial classes in the active model: estimating the distance of a target function to a hypothesis class C with respect to some arbitrary distribution D, using only a small number of label queries to a polynomial-sized pool of unlabeled examples drawn from D. Specifically, we show that for the class D of unions of d intervals on the line, we can estimate the error rate of the best hypothesis in the class to an additive error epsilon from only $O(\frac{1}{\epsilon^6}\log \frac{1}{\epsilon})$ label queries to an unlabeled pool of size $O(\frac{d}{\epsilon^2}\log \frac{1}{\epsilon})$. The key point here is the number of labels needed is independent of the VC-dimension of the class. This extends the work of Balcan et al. [2012] who solved the non-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than epsilon). We also consider the related problem of estimating the performance of a given learning algorithm A in this setting. That is, given a large pool of unlabeled examples drawn from distribution D, can we, from only a few label queries, estimate how well A would perform if the entire dataset were labeled? We focus on k-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of k for the given learning problem).


Property Testing in High Dimensional Ising models

arXiv.org Machine Learning

This paper explores the information-theoretic limitations of graph property testing in zero-field Ising models. Instead of learning the entire graph structure, sometimes testing a basic graph property such as connectivity, cycle presence or maximum clique size is a more relevant and attainable objective. Since property testing is more fundamental than graph recovery, any necessary conditions for property testing imply corresponding conditions for graph recovery, while custom property tests can be statistically and/or computationally more efficient than graph recovery based algorithms. Understanding the statistical complexity of property testing requires the distinction of ferromagnetic (i.e., positive interactions only) and general Ising models. Using combinatorial constructs such as graph packing and strong monotonicity, we characterize how target properties affect the corresponding minimax upper and lower bounds within the realm of ferromagnets. On the other hand, by studying the detection of an antiferromagnetic (i.e., negative interactions only) Curie-Weiss model buried in Rademacher noise, we show that property testing is strictly more challenging over general Ising models. In terms of methodological development, we propose two types of correlation based tests: computationally efficient screening for ferromagnets, and score type tests for general models, including a fast cycle presence test. Our correlation screening tests match the information-theoretic bounds for property testing in ferromagnets.